AITopics | temporal action segmentation

ee6c4b99b4c0d3d60efd22c1ecdd9891-Paper-Conference.pdf

Neural Information Processing SystemsFeb-17-2026, 20:22:24 GMT

artificial intelligence, machine learning, natural language, (15 more...)

Neural Information Processing Systems

Country:

Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.04)
Asia > South Korea > Gyeongsangbuk-do > Pohang (0.04)
Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
(2 more...)

Genre:

Workflow (0.51)
Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Activity Grammars for Temporal Action Segmentation

Neural Information Processing SystemsDec-27-2025, 03:49:59 GMT

Sequence prediction on temporal data requires the ability to understand compositional structures of multi-level semantics beyond individual and contextual properties of parts. The task of temporal action segmentation remains challenging for the reason, aiming at translating an untrimmed activity video into a sequence of action segments. This paper addresses the problem by introducing an effective activity grammar to guide neural predictions for temporal action segmentation. We propose a novel grammar induction algorithm, dubbed KARI, that extracts a powerful context-free grammar from action sequence data. We also develop an efficient generalized parser, dubbed BEP, that transforms frame-level probability distributions into a reliable sequence of actions according to the induced grammar with recursive rules. Our approach can be combined with any neural network for temporal action segmentation to enhance the sequence prediction and discover its compositional structure. Experimental results demonstrate that our method significantly improves temporal action segmentation in terms of both performance and interpretability on two standard benchmarks, Breakfast and 50 Salads.

activity grammar, name change, temporal action segmentation, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Don't Pour Cereal into Coffee: Differentiable Temporal Logic for Temporal Action Segmentation

Neural Information Processing SystemsDec-24-2025, 07:34:58 GMT

We propose Differentiable Temporal Logic (DTL), a model-agnostic framework that introduces temporal constraints to deep networks. DTL treats the outputs of a network as a truth assignment of a temporal logic formula, and computes a temporal logic loss reflecting the consistency between the output and the constraints. We propose a comprehensive set of constraints, which are implicit in data annotations, and incorporate them with deep networks via DTL. We evaluate the effectiveness of DTL on the temporal action segmentation task and observe improved performance and reduced logical errors in the output of different task models. Furthermore, we provide an extensive analysis to visualize the desirable effects of DTL.

differentiable temporal logic, pour cereal, temporal action segmentation, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.42)

Add feedback

M2R2: MultiModal Robotic Representation for Temporal Action Segmentation

Sliwowski, Daniel, Lee, Dongheui

arXiv.org Artificial IntelligenceNov-25-2025

Temporal action segmentation (TAS) has long been a key area of research in both robotics and computer vision. In robotics, algorithms have primarily focused on leveraging proprioceptive information to determine skill boundaries, with recent approaches in surgical robotics incorporating vision. In contrast, computer vision typically relies on exteroceptive sensors, such as cameras. Existing multimodal TAS models in robotics integrate feature fusion within the model, making it difficult to reuse learned features across different models. Meanwhile, pretrained vision-only feature extractors commonly used in computer vision struggle in scenarios with limited object visibility. In this work, we address these challenges by proposing M2R2, a multimodal feature extractor tailored for TAS, which combines information from both proprioceptive and exteroceptive sensors. We introduce a novel pretraining strategy that enables the reuse of learned features across multiple TAS models. Our method achieves state-of-the-art performance on the REASSEMBLE dataset, a challenging multimodal robotic assembly dataset, outperforming existing robotic action segmentation models by 46.6%. Additionally, we conduct an extensive ablation study to evaluate the contribution of different modalities in robotic TAS tasks.

artificial intelligence, machine learning, segmentation, (18 more...)

arXiv.org Artificial Intelligence

2504.18662

Country: Europe > Austria (0.28)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

ee6c4b99b4c0d3d60efd22c1ecdd9891-Paper-Conference.pdf

Neural Information Processing SystemsOct-9-2025, 11:12:10 GMT

action sequence, grammar, probability, (11 more...)

Neural Information Processing Systems

Country:

Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.04)
Asia > South Korea > Gyeongsangbuk-do > Pohang (0.04)
Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
(2 more...)

Genre:

Workflow (0.51)
Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Don't Pour Cereal into Coffee: Differentiable Temporal Logic for Temporal Action Segmentation

Neural Information Processing SystemsMay-27-2025, 06:23:57 GMT

We propose Differentiable Temporal Logic (DTL), a model-agnostic framework that introduces temporal constraints to deep networks. DTL treats the outputs of a network as a truth assignment of a temporal logic formula, and computes a temporal logic loss reflecting the consistency between the output and the constraints. We propose a comprehensive set of constraints, which are implicit in data annotations, and incorporate them with deep networks via DTL. We evaluate the effectiveness of DTL on the temporal action segmentation task and observe improved performance and reduced logical errors in the output of different task models. Furthermore, we provide an extensive analysis to visualize the desirable effects of DTL.

artificial intelligence, differentiable temporal logic, temporal action segmentation, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.48)

Add feedback

Activity Grammars for Temporal Action Segmentation

Neural Information Processing SystemsJan-20-2025, 01:54:40 GMT

Sequence prediction on temporal data requires the ability to understand compositional structures of multi-level semantics beyond individual and contextual properties of parts. The task of temporal action segmentation remains challenging for the reason, aiming at translating an untrimmed activity video into a sequence of action segments. This paper addresses the problem by introducing an effective activity grammar to guide neural predictions for temporal action segmentation. We propose a novel grammar induction algorithm, dubbed KARI, that extracts a powerful context-free grammar from action sequence data. We also develop an efficient generalized parser, dubbed BEP, that transforms frame-level probability distributions into a reliable sequence of actions according to the induced grammar with recursive rules.

activity grammar, compositional structure, temporal action segmentation, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.89)

Add feedback

Don't Pour Cereal into Coffee: Differentiable Temporal Logic for Temporal Action Segmentation

Neural Information Processing SystemsOct-11-2024, 07:07:56 GMT

We propose Differentiable Temporal Logic (DTL), a model-agnostic framework that introduces temporal constraints to deep networks. DTL treats the outputs of a network as a truth assignment of a temporal logic formula, and computes a temporal logic loss reflecting the consistency between the output and the constraints. We propose a comprehensive set of constraints, which are implicit in data annotations, and incorporate them with deep networks via DTL. We evaluate the effectiveness of DTL on the temporal action segmentation task and observe improved performance and reduced logical errors in the output of different task models. Furthermore, we provide an extensive analysis to visualize the desirable effects of DTL.

differentiable temporal logic, pour cereal, temporal action segmentation, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.48)

Add feedback

3D Pose-Based Temporal Action Segmentation for Figure Skating: A Fine-Grained and Jump Procedure-Aware Annotation Approach

Tanaka, Ryota, Suzuki, Tomohiro, Fujii, Keisuke

arXiv.org Artificial IntelligenceAug-29-2024

Understanding human actions from videos is essential in many domains, including sports. In figure skating, technical judgments are performed by watching skaters' 3D movements, and its part of the judging procedure can be regarded as a Temporal Action Segmentation (TAS) task. TAS tasks in figure skating that automatically assign temporal semantics to video are actively researched. However, there is a lack of datasets and effective methods for TAS tasks requiring 3D pose data. In this study, we first created the FS-Jump3D dataset of complex and dynamic figure skating jumps using optical markerless motion capture. We also propose a new fine-grained figure skating jump TAS dataset annotation method with which TAS models can learn jump procedures. In the experimental results, we validated the usefulness of 3D pose features as input and the fine-grained dataset for the TAS model in figure skating. FS-Jump3D Dataset is available at https://github.com/ryota-skating/FS-Jump3D.

artificial intelligence, dataset, machine learning, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3689061.3689077

2408.16638

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Oceania > Australia > Victoria > Melbourne (0.05)
Asia > Japan > Honshū > Chūbu > Aichi Prefecture > Nagoya (0.04)
(5 more...)

Genre: Research Report > New Finding (0.49)

Industry: Leisure & Entertainment > Sports > Figure Skating (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Video Understanding (0.93)

Add feedback

How Much Temporal Long-Term Context is Needed for Action Segmentation?

Bahrami, Emad, Francesca, Gianpiero, Gall, Juergen

arXiv.org Artificial IntelligenceSep-25-2023

Modeling long-term context in videos is crucial for many fine-grained tasks including temporal action segmentation. An interesting question that is still open is how much long-term temporal context is needed for optimal performance. While transformers can model the long-term context of a video, this becomes computationally prohibitive for long videos. Recent works on temporal action segmentation thus combine temporal convolutional networks with self-attentions that are computed only for a local temporal window. While these approaches show good results, their performance is limited by their inability to capture the full context of a video. In this work, we try to answer how much long-term temporal context is required for temporal action segmentation by introducing a transformer-based model that leverages sparse attention to capture the full context of a video. We compare our model with the current state of the art on three datasets for temporal action segmentation, namely 50Salads, Breakfast, and Assembly101. Our experiments show that modeling the full context of a video is necessary to obtain the best performance for temporal action segmentation.

action segmentation, segmentation, video, (12 more...)

arXiv.org Artificial Intelligence

2308.11358

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.04)
Europe > Belgium (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

Filters

Collaborating Authors

temporal action segmentation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

ee6c4b99b4c0d3d60efd22c1ecdd9891-Paper-Conference.pdf

Activity Grammars for Temporal Action Segmentation

Don't Pour Cereal into Coffee: Differentiable Temporal Logic for Temporal Action Segmentation

M2R2: MultiModal Robotic Representation for Temporal Action Segmentation

ee6c4b99b4c0d3d60efd22c1ecdd9891-Paper-Conference.pdf

Don't Pour Cereal into Coffee: Differentiable Temporal Logic for Temporal Action Segmentation

Activity Grammars for Temporal Action Segmentation

Don't Pour Cereal into Coffee: Differentiable Temporal Logic for Temporal Action Segmentation

3D Pose-Based Temporal Action Segmentation for Figure Skating: A Fine-Grained and Jump Procedure-Aware Annotation Approach

How Much Temporal Long-Term Context is Needed for Action Segmentation?